17 research outputs found

    Computing Covers Using Prefix Tables

    Get PDF
    An \emph{indeterminate string} x=x[1..n]x = x[1..n] on an alphabet Σ\Sigma is a sequence of nonempty subsets of Σ\Sigma; xx is said to be \emph{regular} if every subset is of size one. A proper substring uu of regular xx is said to be a \emph{cover} of xx iff for every i1..ni \in 1..n, an occurrence of uu in xx includes x[i]x[i]. The \emph{cover array} γ=γ[1..n]\gamma = \gamma[1..n] of xx is an integer array such that γ[i]\gamma[i] is the longest cover of x[1..i]x[1..i]. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular xx based on prior computation of the border array of xx. In this paper we first describe a linear-time algorithm to compute the cover array of regular string xx based on the prefix table of xx. We then extend this result to indeterminate strings.Comment: 14 pages, 1 figur

    Inferring an Indeterminate String from a Prefix Graph

    Get PDF
    An \itbf{indeterminate string} (or, more simply, just a \itbf{string}) \s{x} = \s{x}[1..n] on an alphabet Σ\Sigma is a sequence of nonempty subsets of Σ\Sigma. We say that \s{x}[i_1] and \s{x}[i_2] \itbf{match} (written \s{x}[i_1] \match \s{x}[i_2]) if and only if \s{x}[i_1] \cap \s{x}[i_2] \ne \emptyset. A \itbf{feasible array} is an array \s{y} = \s{y}[1..n] of integers such that \s{y}[1] = n and for every i2..ni \in 2..n, \s{y}[i] \in 0..n\- i\+ 1. A \itbf{prefix table} of a string \s{x} is an array \s{\pi} = \s{\pi}[1..n] of integers such that, for every i1..ni \in 1..n, \s{\pi}[i] = j if and only if \s{x}[i..i\+ j\- 1] is the longest substring at position ii of \s{x} that matches a prefix of \s{x}. It is known from \cite{CRSW13} that every feasible array is a prefix table of some indetermintate string. A \itbf{prefix graph} \mathcal{P} = \mathcal{P}_{\s{y}} is a labelled simple graph whose structure is determined by a feasible array \s{y}. In this paper we show, given a feasible array \s{y}, how to use \mathcal{P}_{\s{y}} to construct a lexicographically least indeterminate string on a minimum alphabet whose prefix table \s{\pi} = \s{y}.Comment: 13 pages, 1 figur

    String Comparison in VV-Order: New Lexicographic Properties & On-line Applications

    Get PDF
    VV-order is a global order on strings related to Unique Maximal Factorization Families (UMFFs), which are themselves generalizations of Lyndon words. VV-order has recently been proposed as an alternative to lexicographical order in the computation of suffix arrays and in the suffix-sorting induced by the Burrows-Wheeler transform. Efficient VV-ordering of strings thus becomes a matter of considerable interest. In this paper we present new and surprising results on VV-order in strings, then go on to explore the algorithmic consequences

    Algorithms for Longest Common Abelian Factors

    Full text link
    In this paper we consider the problem of computing the longest common abelian factor (LCAF) between two given strings. We present a simple O(σ n2)O(\sigma~ n^2) time algorithm, where nn is the length of the strings and σ\sigma is the alphabet size, and a sub-quadratic running time solution for the binary string case, both having linear space requirement. Furthermore, we present a modified algorithm applying some interesting tricks and experimentally show that the resulting algorithm runs faster.Comment: 13 pages, 4 figure

    Absent words and the (dis)similarity analysis of DNA sequences:An experimental study

    Get PDF
    Additional file 1. All Distance Matrices. In this file (AllMatrices), all the distance matrices are provided

    Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations

    No full text
    Part 8: First Workshop on Algorithms for Data and Text Mining in Bioinformatics (WADTMB 2012)International audienceIn the post-genomic era there has been an explosion in the amount of genomic data available and the primary research problems have moved from being able to produce interesting biological data to being able to efficiently process and store this information. In this paper we present efficient data structures and algorithms for the High Similarity Sequencing Problem. In the High Similarity Sequencing Problem we are given the sequences S0, S1, …, Sk where Sj = ej1Iσ1ej2Iσ2ej3Iσ3,,ejIσe_{j_1} I_{\sigma_1}e_{j_2} I_{\sigma_2} e_{j_3} I_{\sigma_3}, \dots,e_{j_\ell} I_{\sigma_\ell} and must perform pattern matching on the set of sequences. In this paper we present time and memory efficient datastructures by exploiting their extensive similarity, our solution leads to a query time of O(m+vklog+moccvvw+PSC(p)mw)O(m + vk \log \ell + \frac{m occ_v v}{w} + \frac{PSC(p)m}{w}) with a memory usage of O(N logN + vk logvk)

    Simple linear comparison of strings in V-order

    No full text
    In this paper we focus on a total (but non-lexicographic) ordering of strings called V-order. We devise a new linear-time algorithm for computing the V-comparison of two finite strings. In comparison with the previous algorithm in the literature, our algorithm is both conceptually simpler, based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists
    corecore